NSF PAR Search | NSF Public Access Repository

The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains

Edelman, Benjamin L; Edelman, Ezra; Goel, Surbhi; Malach, Eran; Tsilivis, Nikolaos (December 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))

Large language models have the ability to generate text that mimics patterns in their inputs. We introduce a simple Markov Chain sequence modeling task in order to study how this in-context learning (ICL) capability emerges. In our setting, each example is sampled from a Markov chain drawn from a prior distribution over Markov chains. Transformers trained on this task form statistical induction heads which compute accurate next-token probabilities given the bigram statistics of the context. During the course of training, models pass through multiple phases: after an initial stage in which predictions are uniform, they learn to sub-optimally predict using in-context single-token statistics (unigrams); then, there is a rapid phase transition to the correct in-context bigram solution. We conduct an empirical and theoretical investigation of this multi-phase process, showing how successful learning results from the interaction between the transformer's layers, and uncovering evidence that the presence of the simpler unigram solution may delay formation of the final bigram solution. We examine how learning is affected by varying the prior distribution over Markov chains, and consider the generalization of our in-context learning of Markov chains (ICL-MC) task to n-grams for n > 2.
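The task setup and the two in-context predictors named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the Dirichlet prior over transition rows, the uniform initial state, the add-one smoothing, and all function names are assumptions made here for illustration.

```python
import random


def sample_transition_matrix(k, rng, alpha=1.0):
    """Draw a k-state transition matrix, each row ~ Dirichlet(alpha).

    The Dirichlet prior is an assumption of this sketch.
    """
    rows = []
    for _ in range(k):
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(gammas)
        rows.append([g / total for g in gammas])
    return rows


def sample_sequence(P, length, rng):
    """Sample a token sequence from the chain with transition matrix P.

    The first state is drawn uniformly (an assumption of this sketch).
    """
    k = len(P)
    state = rng.randrange(k)
    seq = [state]
    for _ in range(length - 1):
        state = rng.choices(range(k), weights=P[state])[0]
        seq.append(state)
    return seq


def unigram_predict(context, k, smoothing=1.0):
    """Next-token distribution from in-context single-token counts."""
    counts = [smoothing] * k
    for tok in context:
        counts[tok] += 1
    total = sum(counts)
    return [c / total for c in counts]


def bigram_predict(context, k, smoothing=1.0):
    """Next-token distribution from in-context transitions out of the last token."""
    counts = [smoothing] * k
    last = context[-1]
    for prev, nxt in zip(context, context[1:]):
        if prev == last:
            counts[nxt] += 1
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    P = sample_transition_matrix(3, rng)
    seq = sample_sequence(P, 200, rng)
    print("true row:  ", P[seq[-1]])
    print("unigram:   ", unigram_predict(seq, 3))
    print("bigram:    ", bigram_predict(seq, 3))
```

For the context 0, 1, 0, 1, 0 over two states, the unigram predictor gives (4/7, 3/7), while the bigram predictor, which only counts transitions out of the final token 0, gives (1/4, 3/4). That gap is the difference between the sub-optimal and the correct in-context solution described above.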
Full Text Available
Engineering Hybrid-Hydrogels Comprised of Healthy or Diseased Decellularized Extracellular Matrix to Study Pulmonary Fibrosis

https://doi.org/10.1007/s12195-022-00726-y

Saleh, Kamiel S.; Hewawasam, Rukshika; Šerbedžija, Predrag; Blomberg, Rachel; Noreldeen, Saif E.; Edelman, Benjamin; Smith, Bradford J.; Riches, David W.; Magin, Chelsea M. (October 2022, Cellular and Molecular Bioengineering)

Full Text Available
Learning From Strategic Agents: Accuracy, Improvement, and Causality, ICML

Shavit, Yonadav; Edelman, Benjamin; Axelrod, Brian (July 2020, Proceedings of Machine Learning Research)

In many predictive decision-making scenarios, such as credit scoring and academic testing, a decision-maker must construct a model that accounts for agents' incentives to "game" their features in order to receive better decisions. Whereas the strategic classification literature generally assumes that agents' outcomes are not causally dependent on their features (and thus strategic behavior is a form of lying), we join concurrent work in modeling agents' outcomes as a function of their changeable attributes. Our formulation is the first to incorporate a crucial phenomenon: when agents act to change observable features, they may as a side effect perturb unobserved features that causally affect their true outcomes. We consider three distinct desiderata for a decision-maker's model: accurately predicting agents' post-gaming outcomes (accuracy), incentivizing agents to improve these outcomes (improvement), and, in the linear setting, estimating the visible coefficients of the true causal model (causal precision). As our main contribution, we provide the first algorithms for learning accuracy-optimizing, improvement-optimizing, and causal-precision-optimizing linear regression models directly from data, without prior knowledge of agents' possible actions. These algorithms circumvent the hardness result of Miller et al. (2019) by allowing the decision maker to observe agents' responses to a sequence of decision rules, in effect inducing agents to perform causal interventions for free.
Full Text Available
SGD on Neural Networks Learns Functions of Increasing Complexity

Kalimeris, Dimitris; Kaplun, Gal; Nakkiran, Preetum; Edelman, Benjamin L.; Yang, Tristan; Barak, Boaz; Zhang, Haofeng (December 2019, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019)
Full Text Available